AITopics | bias type

Collaborating Authors

bias type

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LearningDebiasedRepresentationvia DisentangledFeatureAugmentation

Neural Information Processing SystemsFeb-11-2026, 07:41:44 GMT

Thesebiased models suffer from the poor generalization capability when evaluated on unbiased datasets. Existing approaches for debiasing often identify and emphasize those samples withnosuchcorrelation (i.e.,bias-conflicting)without defining the bias type in advance. However, such bias-conflicting samples are significantly scarce in biased datasets, limiting the debiasing capability of these approaches.

artificial intelligence, bias-conflicting sample, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > Texas > Travis County > Austin (0.04)
Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Echoes of AI Harms: A Human-LLM Synergistic Framework for Bias-Driven Harm Anticipation

Tantalaki, Nicoleta, Vei, Sophia, Vakali, Athena

arXiv.org Artificial IntelligenceDec-4-2025

The growing influence of Artificial Intelligence (AI) systems on decision-making in critical domains has exposed their potential to cause significant harms, often rooted in biases embedded across the AI lifecycle. While existing frameworks and taxonomies document bias or harms in isolation, they rarely establish systematic links between specific bias types and the harms they cause, particularly within real-world sociotechnical contexts. Technical fixes proposed to address AI biases are ill-equipped to address them and are typically applied after a system has been developed or deployed, offering limited preventive value. We propose ECHO, a novel framework for proactive AI harm anticipation through the systematic mapping of AI bias types to harm outcomes across diverse stakeholder and domain contexts. ECHO follows a modular workflow encompassing stakeholder identification, vignette-based presentation of biased AI systems, and dual (human-LLM) harm annotation, integrated within ethical matrices for structured interpretation. This human-centered approach enables early-stage detection of bias-to-harm pathways, guiding AI design and governance decisions from the outset. We validate ECHO in two high-stakes domains (disease diagnosis and hiring), revealing domain-specific, bias-to-harm patterns and demonstrating ECHO's potential to support anticipatory governance of AI systems

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2512.03068

Country:

Europe (1.00)
North America > United States > California (0.28)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Law (1.00)
Information Technology > Security & Privacy (0.93)
Health & Medicine > Therapeutic Area (0.92)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Any Large Language Model Can Be a Reliable Judge: Debiasing with a Reasoning-based Bias Detector

Yang, Haoyan, Bao, Runxue, Xiao, Cao, Ma, Jun, Bhatia, Parminder, Gao, Shangqian, Kass-Hout, Taha

arXiv.org Artificial IntelligenceOct-29-2025

LLM-as-a-Judge has emerged as a promising tool for automatically evaluating generated outputs, but its reliability is often undermined by potential biases in judgment. Existing efforts to mitigate these biases face key limitations: in-context learning-based methods fail to address rooted biases due to the evaluator's limited capacity for self-reflection, whereas fine-tuning is not applicable to all evaluator types, especially closed-source models. To address this challenge, we introduce the Reasoning-based Bias Detector (RBD), which is a plug-in module that identifies biased evaluations and generates structured reasoning to guide evaluator self-correction. Rather than modifying the evaluator itself, RBD operates externally and engages in an iterative process of bias detection and feedback-driven revision. To support its development, we design a complete pipeline consisting of biased dataset construction, supervision collection, distilled reasoning-based fine-tuning of RBD, and integration with LLM evaluators. We fine-tune four sizes of RBD models, ranging from 1.5B to 14B, and observe consistent performance improvements across all scales. Experimental results on 4 bias types--verbosity, position, bandwagon, and sentiment--evaluated using 8 LLM evaluators demonstrate RBD's strong effectiveness. For example, the RBD-8B model improves evaluation accuracy by an average of 18.5% and consistency by 10.9%, and surpasses prompting-based baselines and fine-tuned judges by 12.8% and 17.2%, respectively. These results highlight RBD's effectiveness and scalability. Additional experiments further demonstrate its strong generalization across biases and domains, as well as its efficiency.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2505.171

Country:

Europe (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.98)

Add feedback

BTC-SAM: Leveraging LLMs for Generation of Bias Test Cases for Sentiment Analysis Models

Kardkovacs, Zsolt T., Djennane, Lynda, Field, Anna, Benatallah, Boualem, Gaci, Yacine, Casati, Fabio, Gaaloul, Walid

arXiv.org Artificial IntelligenceOct-16-2025

Sentiment Analysis (SA) models harbor inherent social biases that can be harmful in real-world applications. These biases are identified by examining the output of SA models for sentences that only vary in the identity groups of the subjects. Constructing natural, linguistically rich, relevant, and diverse sets of sentences that provide sufficient coverage over the domain is expensive, especially when addressing a wide range of biases: it requires domain experts and/or crowd-sourcing. In this paper, we present a novel bias testing framework, BTC-SAM, which generates high-quality test cases for bias testing in SA models with minimal specification using Large Language Models (LLMs) for the controllable generation of test sentences. Our experiments show that relying on LLMs can provide high linguistic variation and diversity in the test sentences, thereby offering better test coverage compared to base prompting methods even for previously unseen biases.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2509.24101

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report > New Finding (0.46)

Industry:

Education (1.00)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback

IndiCASA: A Dataset and Bias Evaluation Framework in LLMs Using Contrastive Embedding Similarity in the Indian Context

S, Santhosh G, S, Akshay Govind, Krishnan, Gokul S, Ravindran, Balaraman, Natarajan, Sriraam

arXiv.org Artificial IntelligenceOct-6-2025

Large Language Models (LLMs) have gained significant traction across critical domains owing to their impressive contextual understanding and generative capabilities. However, their increasing deployment in high stakes applications necessitates rigorous evaluation of embedded biases, particularly in culturally diverse contexts like India where existing embedding-based bias assessment methods often fall short in capturing nuanced stereotypes. We propose an evaluation framework based on a encoder trained using contrastive learning that captures fine-grained bias through embedding similarity. We also introduce a novel dataset - IndiCASA (IndiBias-based Contextually Aligned Stereotypes and Anti-stereotypes) comprising 2,575 human-validated sentences spanning five demographic axes: caste, gender, religion, disability, and socioeconomic status. Our evaluation of multiple open-weight LLMs reveals that all models exhibit some degree of stereotypical bias, with disability related biases being notably persistent, and religion bias generally lower likely due to global debiasing efforts demonstrating the need for fairer model development.

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2510.02742

Country: North America > United States (0.46)

Genre: Research Report > New Finding (0.46)

Industry:

Health & Medicine (1.00)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

BiasFreeBench: a Benchmark for Mitigating Bias in Large Language Model Responses

Xu, Xin, He, Xunzhi, Zhi, Churan, Chen, Ruizhe, McAuley, Julian, He, Zexue

arXiv.org Artificial IntelligenceOct-2-2025

Existing studies on bias mitigation methods for large language models (LLMs) use diverse baselines and metrics to evaluate debiasing performance, leading to inconsistent comparisons among them. Moreover, their evaluations are mostly based on the comparison between LLMs' probabilities of biased and unbiased contexts, which ignores the gap between such evaluations and real-world use cases where users interact with LLMs by reading model responses and expect fair and safe outputs rather than LLMs' probabilities. To enable consistent evaluation across debiasing methods and bridge this gap, we introduce BiasFreeBench, an empirical benchmark that comprehensively compares eight mainstream bias mitigation techniques (covering four prompting-based and four training-based methods) on two test scenarios (multi-choice QA and open-ended multi-turn QA) by reorganizing existing datasets into a unified query-response setting. We further introduce a response-level metric, Bias-Free Score, to measure the extent to which LLM responses are fair, safe, and anti-stereotypical. Debiasing performances are systematically compared and analyzed across key dimensions: the prompting vs. training paradigm, model size, and generalization of different training strategies to unseen bias types. We will publicly release our benchmark, aiming to establish a unified testbed for bias mitigation research.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2510.00232

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Bridging Fairness and Explainability: Can Input-Based Explanations Promote Fairness in Hate Speech Detection?

Wang, Yifan, Jobanputra, Mayank, Lee, Ji-Ung, Oh, Soyoung, Valera, Isabel, Demberg, Vera

arXiv.org Artificial IntelligenceSep-29-2025

Natural language processing (NLP) models often replicate or amplify social bias from training data, raising concerns about fairness. At the same time, their black-box nature makes it difficult for users to recognize biased predictions and for developers to effectively mitigate them. While some studies suggest that input-based explanations can help detect and mitigate bias, others question their reliability in ensuring fairness. Existing research on explainability in fair NLP has been predominantly qualitative, with limited large-scale quantitative analysis. In this work, we conduct the first systematic study of the relationship between explainability and fairness in hate speech detection, focusing on both encoder- and decoder-only models. We examine three key dimensions: (1) identifying biased predictions, (2) selecting fair models, and (3) mitigating bias during model training. Our findings show that input-based explanations can effectively detect biased predictions and serve as useful supervision for reducing bias during training, but they are unreliable for selecting fair models among candidates.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.22291

Country:

Europe (1.00)
Asia > Middle East > UAE (0.46)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (1.00)

Industry: Transportation (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

EchoBench: Benchmarking Sycophancy in Medical Large Vision-Language Models

Yuan, Botai, Zhou, Yutian, Wang, Yingjie, Huo, Fushuo, Jing, Yongcheng, Shen, Li, Wei, Ying, Shen, Zhiqi, Liu, Ziwei, Zhang, Tianwei, Yang, Jie, Tao, Dacheng

arXiv.org Artificial IntelligenceSep-25-2025

Recent benchmarks for medical Large Vision-Language Models (LVLMs) emphasize leaderboard accuracy, overlooking reliability and safety. We study sycophancy -- models' tendency to uncritically echo user-provided information -- in high-stakes clinical settings. We introduce EchoBench, a benchmark to systematically evaluate sycophancy in medical LVLMs. It contains 2,122 images across 18 departments and 20 modalities with 90 prompts that simulate biased inputs from patients, medical students, and physicians. We evaluate medical-specific, open-source, and proprietary LVLMs. All exhibit substantial sycophancy; the best proprietary model (Claude 3.7 Sonnet) still shows 45.98% sycophancy, and GPT-4.1 reaches 59.15%. Many medical-specific models exceed 95% sycophancy despite only moderate accuracy. Fine-grained analyses by bias type, department, perceptual granularity, and modality identify factors that increase susceptibility. We further show that higher data quality/diversity and stronger domain knowledge reduce sycophancy without harming unbiased accuracy. EchoBench also serves as a testbed for mitigation: simple prompt-level interventions (negative prompting, one-shot, few-shot) produce consistent reductions and motivate training- and decoding-time strategies. Our findings highlight the need for robust evaluation beyond accuracy and provide actionable guidance toward safer, more trustworthy medical LVLMs.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2509.20146

Country: Asia (0.67)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (1.00)
Health & Medicine > Consumer Health (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

d360a502598a4b64b936683b44a5523a-Paper.pdf

Neural Information Processing SystemsAug-17-2025, 13:37:30 GMT

artificial intelligence, machine learning, natural language, (17 more...)

Neural Information Processing Systems

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Texas > Travis County > Austin (0.04)
Asia > South Korea (0.04)
Asia > China > Hong Kong (0.04)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Vision (0.94)
Information Technology > Artificial Intelligence > Natural Language (0.69)

Add feedback

HESEIA: A community-based dataset for evaluating social biases in large language models, co-designed in real school settings in Latin America

Ivetta, Guido, Gomez, Marcos J., Martinelli, Sofía, Palombini, Pietro, Echeveste, M. Emilia, Mazzeo, Nair Carolina, Busaniche, Beatriz, Benotti, Luciana

arXiv.org Artificial IntelligenceJun-2-2025

Most resources for evaluating social biases in Large Language Models are developed without co-design from the communities affected by these biases, and rarely involve participatory approaches. We introduce HESEIA, a dataset of 46,499 sentences created in a professional development course. The course involved 370 high-school teachers and 5,370 students from 189 Latin-American schools. Unlike existing benchmarks, HESEIA captures intersectional biases across multiple demographic axes and school subjects. It reflects local contexts through the lived experience and pedagogical expertise of educators. Teachers used minimal pairs to create sentences that express stereotypes relevant to their school subjects and communities. We show the dataset diversity in term of demographic axes represented and also in terms of the knowledge areas included. We demonstrate that the dataset contains more stereotypes unrecognized by current LLMs than previous datasets. HESEIA is available to support bias assessments grounded in educational communities.

large language model, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2505.24712

Country:

South America (1.00)
North America > United States (0.67)

Genre:

Research Report (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.67)
Education > Educational Setting > K-12 Education > Secondary School (0.35)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Add feedback